The World Happiness Report is a landmark survey of the state of global happiness. The World Happiness Report 2018 ranks 156 countries by their happiness levels. It is produced by a group of independent experts using data from the yearly Gallup World Poll (GWP). The happiness index is built from several major factors. I selected the factors below to study their effect on the happiness score.
Country (String): Name of the country.
Score (Float): National average response to the question: “Please imagine a ladder, with the worst possible life as a 0 and the best possible life as a 10. On which step of the ladder would you say you personally feel you stand at this time?” This measure is also known as the Cantril life ladder.
GDP per Capita (Float): Natural log of GDP per capita. GDP per capita is the total value of all the goods and services produced by a country in a particular year, divided by the number of people living there.
Social Support (Float): National average of the responses to the GWP question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
Healthy Life Expectancy (Float): Average healthy life expectancy, based on data from the WHO and the WDI. Healthy life expectancy is the average number of years that a person can expect to live in “full health”, taking into account years lived in less than full health due to disease and/or injury.
Positive Affect (Float): Average of three positive affect measures in the GWP: happiness, laughter, and enjoyment.
Negative Affect (Float): Average of three negative affect measures in the GWP: worry, sadness, and anger.
Continent (String): Continent of the country.
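Because GDP per capita is stored as a natural logarithm, small differences in the column correspond to large differences in dollars. The following is a minimal sketch in base R of that conversion; the dollar amounts are made up for illustration and do not come from the dataset.

```r
# Hypothetical GDP per capita figures in dollars (illustrative only).
gdp_dollars <- c(low = 1000, middle = 10000, high = 100000)

# The dataset stores the natural log of GDP per capita.
log_gdp <- log(gdp_dollars)

# exp() reverses the transformation and recovers the dollar amounts.
recovered <- exp(log_gdp)

# A one-unit increase on the log scale multiplies GDP per capita by e.
ratio_per_unit <- exp(1)

print(round(log_gdp, 2))
```

Reading the column this way explains why a range of roughly 6 to 12 on the log scale covers both very poor and very rich countries.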
Q1) How has the world happiness score changed over the years?
Q2) What factors affect the happiness score?
Q3) To what extent do these factors affect the happiness score?
Q4) How has the happiness score changed in each continent over the years?
Q5) How is the factor with the strongest effect on the happiness score distributed across the continents?
## Score
## Min. :2.662
## 1st Qu.:4.575
## Median :5.326
## Mean :5.432
## 3rd Qu.:6.269
## Max. :7.971
The happiness score is approximately normally distributed. The data range from about 2.7 to 8.0. The mean, median, and mode are almost equal: mean = 5.4, median = 5.3, mode = 5. There are no outliers in the score.
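The summary and outlier check described above can be sketched in base R as follows. The scores are made up for illustration, and the outlier rule shown is the standard box-plot rule (points beyond 1.5 times the interquartile range), which I assume is what the box plots in this report use.

```r
# Made-up scores on the 0-10 ladder (illustrative only, not the report's data).
score <- c(2.7, 3.9, 4.4, 4.6, 5.0, 5.3, 5.4, 5.9, 6.3, 6.9, 7.4, 8.0)

# Five-number summary plus the mean, in the same layout as the output above.
print(summary(score))

# boxplot.stats() returns the points lying beyond 1.5 * IQR from the hinges.
outliers <- boxplot.stats(score)$out

# A small gap between mean and median is consistent with a symmetric shape.
gap <- mean(score) - median(score)
```

An empty `outliers` vector corresponds to a box plot with no individual points drawn beyond the whiskers.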
## GDP.per.capita
## Min. : 6.377
## 1st Qu.: 8.328
## Median : 9.412
## Mean : 9.228
## 3rd Qu.:10.188
## Max. :11.770
The distribution of GDP per capita is left-skewed, which means that most values are relatively large and the tail extends toward small values. The data range from 6.4 to 11.8, with mean = 9.2, median = 9.4, and mode = 10.8. There are no outliers in GDP per capita.
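A left skew can be checked numerically as well as visually: the mean falls below the median and the skewness coefficient is negative. The sketch below uses a simple moment-based skewness estimator written by hand; the helper name `skewness` and the data are assumptions for illustration only.

```r
# Moment-based skewness: third central moment over the cubed standard deviation.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Made-up log GDP values with a long tail toward low values (illustrative only).
log_gdp <- c(6.4, 7.1, 8.3, 8.9, 9.2, 9.4, 9.8, 10.1, 10.2, 10.6, 10.8, 10.9)

# Negative skewness and mean < median both point to a left-skewed distribution.
sk <- skewness(log_gdp)
mean_below_median <- mean(log_gdp) < median(log_gdp)
```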
## Social.support
## Min. :0.2902
## 1st Qu.:0.7452
## Median :0.8308
## Mean :0.8086
## 3rd Qu.:0.9041
## Max. :0.9873
The distribution of social support is also left-skewed. The data range from 0.29 to 0.99, and most values are near 1. The mean = 0.81 and the median = 0.83. The box plot shows outliers between 0.3 and 0.5.
## Healthy.life.expectancy.at.birth
## Min. :39.35
## 1st Qu.:57.20
## Median :64.04
## Mean :62.48
## 3rd Qu.:68.29
## Max. :76.54
The distribution of healthy life expectancy at birth is left-skewed. The data range from 39.35 to 76.54, and most values are between 60 and 70, which means that a large number of countries have a good health score. The mean = 62.48, the median = 64.04, and the mode = 65. The box plot shows outliers near 40.
## Positive.affect
## Min. :0.3625
## 1st Qu.:0.6183
## Median :0.7155
## Mean :0.7073
## 3rd Qu.:0.7986
## Max. :0.9436
The positive affect distribution is bimodal. This means that no single value occurs with the highest frequency: there are two modes, one at 0.64 and another at 0.85. The mean = 0.71 and the median = 0.72. The data range from 0.36 to 0.94. This distribution surprised me, because I expected it to be left-skewed. There are no outliers in the positive affect.
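One way to confirm two modes without relying on the histogram alone is to look for local maxima in a kernel density estimate. The sketch below is an illustration on simulated data, not the method used in this report; the helper `density_peaks` is hypothetical, and the two centres are chosen only to mimic the modes mentioned above.

```r
# Locate local maxima of a kernel density estimate and return the tallest ones.
density_peaks <- function(x, n_peaks = 2) {
  d <- density(x)
  # A grid point is a local maximum where the slope turns from positive to negative.
  idx <- which(diff(sign(diff(d$y))) == -2) + 1
  # Keep the n_peaks tallest maxima, reported in increasing x order.
  keep <- seq_len(min(n_peaks, length(idx)))
  top <- idx[order(d$y[idx], decreasing = TRUE)][keep]
  sort(d$x[top])
}

# Simulated two-group data (illustrative only).
set.seed(1)
x <- c(rnorm(300, mean = 0.64, sd = 0.04), rnorm(300, mean = 0.85, sd = 0.04))

peaks <- density_peaks(x)
```

The result depends on the smoothing bandwidth chosen by `density()`, so peak locations should be read as approximate.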
## Negative.affect
## Min. :0.09549
## 1st Qu.:0.20466
## Median :0.25394
## Mean :0.26443
## 3rd Qu.:0.31382
## Max. :0.70459
The negative affect distribution is right-skewed. This distribution meets my expectation, because it makes sense that sadness, worry, and anger affect happiness. The mean is 0.26, the median is 0.25, and the mode is 0.23. There are outliers above 0.48.
## Continent
## Africa :320
## Asia :398
## Europe :349
## North America:117
## Oceania : 18
## South America: 99
Asia has the most rows (398), which partly reflects the large number of countries in Asia. Europe follows with 349 rows, then Africa with 320, North America with 117, South America with 99, and Oceania with only 18. These counts are not the number of countries in each continent, because each country contributes one row for every year in which it has data.
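The difference between rows per continent and countries per continent can be sketched in base R as follows. The data frame and its column names are assumptions for illustration; the countries are placeholders.

```r
# Made-up country-year rows (illustrative only).
happy <- data.frame(
  Country   = c("A", "A", "A", "B", "B", "C"),
  Year      = c(2015, 2016, 2017, 2016, 2017, 2017),
  Continent = c("Asia", "Asia", "Asia", "Asia", "Asia", "Oceania")
)

# Rows per continent: this is what the summary of the Continent column reports.
rows_per_continent <- table(happy$Continent)

# Distinct countries per continent: drop the repeated years first.
unique_pairs <- unique(happy[, c("Country", "Continent")])
countries_per_continent <- table(unique_pairs$Continent)
```

Here the placeholder Asia group has five rows but only two countries, which is the same effect that inflates the row counts above.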
The dataset contains 1301 observations of 9 features.
The main feature is the happiness score, with values between 0 and 10.
I focused mostly on comparing the factors across continents because I want to understand how happiness behaves around the world.
I created a new categorical column named Continent. I built it from another dataset that matches each country with its continent.
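Adding a continent column from a second dataset is a join on the country name. The sketch below shows one way to do it with base R's `merge()`; the lookup table, the scores, and the unmatched country are all made up, and the actual lookup dataset used in this project is not shown here.

```r
# Made-up happiness rows and a hypothetical country-to-continent lookup table.
happy <- data.frame(
  Country = c("Finland", "Kenya", "Japan", "Atlantis"),
  Score   = c(7.6, 4.4, 5.9, 5.0),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  Country   = c("Finland", "Kenya", "Japan"),
  Continent = c("Europe", "Africa", "Asia"),
  stringsAsFactors = FALSE
)

# Left join: keep every happiness row even when the lookup has no match.
merged <- merge(happy, lookup, by = "Country", all.x = TRUE)

# Countries whose names did not match need fixing before the analysis.
unmatched <- merged$Country[is.na(merged$Continent)]

# Store the new column as a categorical variable.
merged$Continent <- factor(merged$Continent)
```

Checking `unmatched` after the join is worthwhile because country names are often spelled differently across datasets.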
The first step in my project was cleaning the data so that it was ready to investigate.
In this section we will answer these questions: Q1) How has the world happiness score changed over the years? Q2) What factors affect the happiness score? Q3) To what extent do these factors affect the happiness score?
In general, the happiness score is highest in 2017 and lowest in 2014. From 2008 to 2010 it increased by approximately 0.09 point, and from 2010 to 2011 it dropped sharply by about 0.1 point. One possible reason is the amount of data available for each year, so let us check it.
## # A tibble: 10 x 3
## Year Score n
## <int> <dbl> <int>
## 1 2008 5.42 108
## 2 2009 5.47 112
## 3 2010 5.51 119
## 4 2011 5.41 143
## 5 2012 5.44 139
## 6 2013 5.40 132
## 7 2014 5.36 138
## 8 2015 5.40 138
## 9 2016 5.39 139
## 10 2017 5.53 133
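A table like the one above, with the mean score and the number of rows per year, can be produced in a few lines. The tibble header suggests the original used dplyr; the sketch below uses base R instead and made-up values, so treat it as an equivalent illustration and not the original code.

```r
# Made-up country-year scores (illustrative only).
happy <- data.frame(
  Year  = c(2008, 2008, 2009, 2009, 2009, 2010),
  Score = c(5.0, 6.0, 4.0, 5.0, 6.0, 7.0)
)

# Mean score per year.
by_year <- aggregate(Score ~ Year, data = happy, FUN = mean)

# Number of rows per year, aligned to the same year order.
by_year$n <- as.vector(table(happy$Year)[as.character(by_year$Year)])

print(by_year)
```

Reading `n` next to the mean helps judge whether a jump in the yearly average coincides with a change in how many countries were surveyed.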
R has helpful packages that reduce the amount of code and still give good results. In this section, I use a correlation matrix to find the correlation coefficient for every pair of factors. Let’s investigate the matrix together.
The table below summarizes the strength and direction of each relationship based on the correlation coefficients. I used abbreviated factor names to simplify the table.
| - | Score | GDP | Social | Healthy | Positive | Negative |
|---|---|---|---|---|---|---|
| Score | - | |||||
| GDP | +Strong | - | ||||
| Social | +Strong | +Strong | - | |||
| Healthy | +Strong | +Strong | +Strong | - | ||
| Positive | +Strong | +Weak | +Weak | +Weak | - | |
| Negative | -Weak | -Weak | -Weak | -Weak | -Weak | - |
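The labels in the table can be derived from a correlation matrix with a small helper. The sketch below uses base R's `cor()` on simulated data; the helper `label_cor` and its 0.5 cut-off between "Weak" and "Strong" are assumptions for illustration, since the threshold used for the table is not stated.

```r
# Label a correlation coefficient by sign and strength.
# The 0.5 cut-off is an assumption for illustration, not a universal rule.
label_cor <- function(r, strong = 0.5) {
  strength <- ifelse(abs(r) >= strong, "Strong", "Weak")
  paste0(ifelse(r >= 0, "+", "-"), strength)
}

# Simulated data (illustrative only).
set.seed(42)
gdp <- rnorm(100, mean = 9, sd = 1)
dat <- data.frame(
  Score    = 5 + 0.8 * (gdp - 9) + rnorm(100, sd = 0.5),
  GDP      = gdp,
  Negative = rnorm(100, mean = 0.26, sd = 0.08)
)

# Pearson correlation matrix; use = "complete.obs" drops rows with missing values.
cm <- cor(dat, use = "complete.obs")

# Apply the labels to every cell, keeping the matrix shape and names.
labels <- cm
labels[] <- label_cor(cm)
print(labels)
```

With this cut-off, `label_cor(c(0.77, -0.26))` gives "+Strong" and "-Weak", matching how those coefficients are described in this report.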
Now that we have a good overview, let’s see what the scatter plots of the happiness score against the other factors look like.
From the chart, we see that the relationship between Score and GDP per capita is strong and positive.
From the chart, we see that the relationship between Score and Social support is strong and positive.
From the chart, we see that the relationship between Score and Healthy life expectancy at birth is strong and positive.
From the chart, we see that the relationship between Score and Positive affect is strong and positive, but weaker than the others.
From the chart, we see that the relationship between Score and Negative affect is weak and negative.
I set the y-axis range from 0 to 10 to understand where the continents sit on the score ladder.
In general, Europe has the highest scores and Africa has the lowest. Asia has the widest range and Oceania has the narrowest. We should take into consideration the number of countries in each continent.
Africa: The score ranges from 2.8 to 5.8, the distribution is almost normal, the median is 4.4, and there are outliers above 6.
Asia: The score has a wide range, from 3 to 7.5, the box is right-skewed, the median is 5.1, and there are outliers below 3.
Europe: The score has a wide range, from 3.9 to 8, the box is right-skewed, the median is 6, and there are no outliers.
North America: The score has a wide range, from 3.5 to 7.8, the box is left-skewed, the median is 6.5, and there are no outliers.
South America: The score ranges from 5 to 7.5, the box is roughly symmetric, the median is 6.4, and there are outliers around 4.
Oceania cannot be read from this chart, so I created a new one that zooms in by narrowing the y-axis limits.
Oceania: The score ranges from 7.15 to 7.48, the box is roughly symmetric, the median is 7.26, and there are no outliers.
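The per-continent medians, ranges, and outliers read from the box plots can also be computed directly. The following base R sketch uses two placeholder groups with made-up scores; the column names are assumptions.

```r
# Made-up scores for two hypothetical groups (illustrative only).
happy <- data.frame(
  Continent = rep(c("X", "Y"), each = 6),
  Score     = c(4.0, 4.2, 4.4, 4.6, 4.8, 7.5,   # X: one unusually high value
                7.1, 7.2, 7.2, 7.3, 7.4, 7.5)   # Y: narrow range
)

# Median, minimum and maximum per group.
stats <- aggregate(
  Score ~ Continent, data = happy,
  FUN = function(s) c(median = median(s), min = min(s), max = max(s))
)

# Values a box plot would draw as individual points (beyond 1.5 * IQR).
outliers <- tapply(happy$Score, happy$Continent,
                   function(s) boxplot.stats(s)$out)

print(stats)
```

Group Y illustrates the Oceania case: a narrow range that is hard to read on a 0 to 10 axis but clear in the numbers.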
The happiness score is correlated with the other factors as shown in the table:
| Factor | Correlation coefficient | Relation strength | Relation direction |
|---|---|---|---|
| GDP per capita | 0.77 | Strong | Positive |
| Social support | 0.7 | Strong | Positive |
| Healthy life expectancy at birth | 0.74 | Strong | Positive |
| Positive affect | 0.55 | Strong | Positive |
| Negative affect | -0.26 | Weak | Negative |
The most interesting relationship is between GDP per capita and healthy life expectancy at birth, because it has the highest correlation coefficient in the matrix, 0.85. GDP per capita is especially useful when comparing one country to another because it shows the relative performance of the countries. A rise in GDP per capita signals growth in the economy and tends to reflect an increase in productivity. This increase is reflected in the quality of the services a country provides to its citizens, and health is one of the most important of these services.
Read more: https://www.investopedia.com/terms/p/per-capita-gdp.asp
GDP per capita and healthy life expectancy at birth
In this section we will answer these questions: Q4) How has the happiness score changed in each continent over the years? Q5) How is the factor with the strongest effect on the happiness score distributed across the continents?
Now we split the overall line into separate lines, one for each continent. This helps us to understand which years show drops.
Africa: The interesting interval is between 2010 and 2013, because these years include many events related to the Arab Spring. The Arab Spring began in late 2010 in response to oppressive regimes and a low standard of living, beginning with protests in Tunisia. The effects of the Tunisian Revolution spread strongly to Libya and Egypt. Sustained street demonstrations took place in Morocco, Algeria, and Sudan.
Asia: Between 2011 and 2013 Asia also experienced problems related to the Arab Spring, but the effect here is minor compared with Africa. Almost all of the problems were resolved by the end of 2011, along with governmental changes in Saudi Arabia, Bahrain, Jordan, Oman, Kuwait, and Palestine.
Europe: The interesting interval is between 2008 and 2009, because the happiness score decreases by almost 1.2 points. I searched the internet but did not find a specific event that happened in that year. Maybe the data is not sufficient, or something happened earlier and its effect appeared later; checking this would require the 2007 data.
North America: On the blue line, the major change occurs from 2009 to 2011. When I searched for possible causes, I found a lead that needs more investigation: did the financial crisis of 2007–2008 have a long-term effect on the world happiness score? I may consider this as a question for future work.
South America: The score increases from 2008 to 2009 by 0.8, stays at that level until 2013, and then drops again in 2016. I did not find any interesting information, but we should remember that the rows with missing data that we removed from the original dataset may have affected the result.
Oceania: The line here changes very little. This is the smallest continent by area and by number of countries, and the mean score is strongly affected by this small number.
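The per-continent trend lines and the size of each drop can be computed by averaging within continent and year and then differencing. This is a base R sketch on made-up data with placeholder groups; the column names are assumptions.

```r
# Made-up country-year scores (illustrative only).
happy <- data.frame(
  Continent = c("X", "X", "X", "X", "Y", "Y", "Y", "Y"),
  Year      = c(2010, 2010, 2011, 2011, 2010, 2010, 2011, 2011),
  Score     = c(5.0, 6.0, 4.0, 5.0, 7.0, 7.2, 7.1, 7.3)
)

# Mean score for every continent-year pair: one line per continent on the chart.
trend <- aggregate(Score ~ Continent + Year, data = happy, FUN = mean)
trend <- trend[order(trend$Continent, trend$Year), ]

# Year-over-year change within each continent (NA for each first year).
trend$change <- ave(trend$Score, trend$Continent,
                    FUN = function(s) c(NA, diff(s)))

# Most negative single-year change per continent.
biggest_drop <- tapply(trend$change, trend$Continent, min, na.rm = TRUE)
```

A continent with few countries, like group Y here, will show small changes that are sensitive to each individual country.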
In general, GDP per capita has a strong positive relationship with the happiness score.
Africa: GDP per capita ranges from 6 to 10, and its relationship with the happiness score is strong and positive. There is an outlier above 6.
Asia: GDP per capita ranges from 7 to 12, and its relationship with the happiness score is strong and positive. There is an outlier near 6.
Europe: GDP per capita ranges from 8.2 to 11.2, and its relationship with the happiness score is strong and positive. There is an outlier between 5.5 and 6.
North America: GDP per capita ranges from 6.5 to 11, and its relationship with the happiness score is strong and positive.
Oceania: I expected this chart. The range is tiny, but the values are high, above 10.
South America: GDP per capita ranges from 8 to 10, and its relationship with the happiness score is strong and positive. There is an outlier at 4.
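The strength of the GDP and score relationship within each continent can be quantified with one correlation coefficient per group. The sketch below uses base R on simulated data with placeholder groups; it illustrates the idea and does not reproduce the report's values.

```r
# Simulated data for two hypothetical groups (illustrative only).
set.seed(7)
n <- 50
happy <- data.frame(
  Continent = rep(c("X", "Y"), each = n),
  GDP       = c(rnorm(n, 8, 1), rnorm(n, 10, 0.5))
)
happy$Score <- 0.7 * happy$GDP - 1 + rnorm(2 * n, sd = 0.3)

# Pearson correlation between log GDP per capita and score within each group.
cor_by_continent <- sapply(
  split(happy, happy$Continent),
  function(d) cor(d$GDP, d$Score, use = "complete.obs")
)

# Number of rows behind each coefficient; small groups give unstable estimates.
n_by_continent <- table(happy$Continent)

print(round(cor_by_continent, 2))
```

Reporting the row count next to each coefficient matters for a group as small as Oceania, where a narrow GDP range can make the coefficient unreliable.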
First, I looked at the trend of the happiness score in the world's continents over the years. I found interesting intervals in Africa and Asia between 2010 and 2013. Then I chose GDP per capita because it has the strongest correlation with the happiness score compared with the other factors, and I wanted to understand how this relationship is distributed in each continent.
I chose this chart because it gives full coverage of the relationships in our data. It shows both the strength and the direction of these relationships.
This chart shows the strength and direction of the relationships between all factors in our dataset. The strongest positive relationship is between GDP per capita and healthy life expectancy at birth. The weakest positive relationships are between positive affect and both GDP per capita and healthy life expectancy at birth. The strongest negative relationships are between negative affect on one side and social support and positive affect on the other. The weakest negative relationship is between negative affect and healthy life expectancy at birth.
I chose this chart because it is a good starting point for investigating the main factor over the years.
In general, the happiness score is highest in 2017 and lowest in 2014. From 2008 to 2010 it increased by approximately 0.09 point, and from 2010 to 2011 it dropped sharply by about 0.1 point. One possible reason is the amount of data available for each year, which changes from year to year.
Continent is the column that I created in my dataset. I created it because I want to study happiness around the world, and I thought it was better to group my data than to study the full list of countries. I chose this chart because it describes the distribution of the main factor in each group.
In general, Europe has the highest scores and Africa has the lowest. Asia has the widest range and Oceania has the narrowest. We should take into consideration the number of countries in each continent.
From this exploratory analysis, we observed the relationships between all factors and our main factor, the happiness score. In this project I learned a lot, which gives me more confidence that I am on the right path in learning data analysis.
I faced several struggles with this dataset. First, I was confused by some variables and did not know what they meant. Second, the data was not ready to analyze, so I spent extra time cleaning it. Third, when I looked at the data I decided to add a new category, “Continent”, but it took time to find another dataset and merge it with mine. Finally, the knitr report did not save the first time, and many errors occurred.
The project succeeded despite these struggles because the World Happiness Report 2018 contains a lot of information and analysis, which inspired me to finish my project. The cleaning step produced good enough data, and the correlation coefficients of the variables were easy to calculate.
For future work, I think the best direction is to look at personal factors. For example, what is the difference between men and women? Does age affect happiness? What about education level, or the amount of savings each person has?
https://dictionary.cambridge.org/dictionary/english/gdp-per-capita
http://www.who.int/healthinfo/statistics/indhale/en/
http://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2
http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels
http://www.sthda.com/english/articles/32-r-graphics-essentials/128-plot-time-series-data-using-ggplot/
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
Online RStudio: https://labs.cognitiveclass.ai/
Markdown table generator: https://www.tablesgenerator.com/markdown_tables#